Performance Measurement of Applications with GPU Acceleration using CUDA

Authors

  • Shangkar Mayanglambam
  • Allen D. Malony
  • Matthew J. Sottile
Abstract

Multi-core accelerators offer significant potential to improve the performance of parallel applications. However, tools to help the parallel application developer understand accelerator performance and its impact are scarce. An approach is presented to measure the performance of GPU computations programmed using CUDA and to integrate this information with application performance data captured with the TAU Performance System. Test examples are shown to validate the measurement methods. Results are given for a case study of the GPU-accelerated NAMD molecular dynamics application.

Similar resources

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parall...

Implementation of String Match Algorithm BMH on GPU Using CUDA

The string match algorithm is widely used in the area of data mining. In this paper, we present an approach for improving the performance of this algorithm via the GPU (Graphics Processing Unit). With the rapid development of Graphics Processing Units into many-core multiprocessors, they show great potential in many applications and in high performance computing. Especially, the heterogeneous architecture CPU+...

Landau gauge fixing on the lattice using GPU's

In this work, we consider the GPU implementation of the steepest descent method with Fourier acceleration for Landau gauge fixing, using CUDA. The performance of the code on a Tesla C2070 GPU is compared with a parallel CPU implementation.

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several different methods for building an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, which consists of a large number of diverse sub-models in both the spatial and transform domains. However, the extraction of various types of features from an image is very time consuming in some steps, especially for training pha...

GPU-Accelerated Computation for Robust Motion Tracking Using the CUDA Framework

In this paper, we discuss our implementation of a graphics hardware acceleration of the Vector Coherence Mapping vision processing algorithm. Using this algorithm as our test case, we discuss our optimization strategy for various vision processing operations using NVIDIA’s new CUDA programming framework. We also demonstrate how flexibly and readily vision processing algorithms can be mapped ont...

Publication date: 2009